In this lab, you will try to create an Availability Group that fails at the join database to the availability group step. The goal is to try and identify the cause of the failure and fix the issue.
At the end of this lab, you will be able to:
45 minutes
Use the following credentials to login into virtual environment
Before starting the training module, we recommend that you launch the labs and give them some time to stabilize. Please be aware that sometimes the AG may be in a resolving state and AG Replicas may be in a disconnected state. This is a platform issue and should stabilize after a few minutes
Before you begin with the first exercise in the lab, let's review the lab environment.
In the lab, you have one Domain Controller and 3 nodes + 1 client computer.
AlwaysOnN1 and AlwaysOnN2 nodes are in the primary Datacenter.
AlwaysOnN3 is in the secondary datacenter.
For this lab, both the datacenters are in the same subnet.
Each node has Windows Server 2022 O/S installed.
SQL Server 2022 Standalone instances are installed on all the 3 nodes (i.e. AlwaysOnN1, AlwaysOnN2, AlwaysOnN3).
SQL Service logon account details
In this exercise, you will learn how to Create an Availability Group between 3 replicas.
Create Availability Group using the New Availability Group wizard
In this task, you will use the SQL Server management studio to create an Availability Group. Perform this task on the virtual machine AlwaysOnClient.
Open SSMS and Connect to AlwaysOnN1, AlwaysOnN2 and AlwaysOnN3 SQL Instances. Review the databases on all the 3 instances. We will try to create an Availability Group for the AdventureWorks database on the AlwaysOnN1 (Primary Replica). There are no Availability Groups deployed yet. The replica databases have been initialized on the AlwaysOnN2 and AlwaysOnN3 instances.
On the primary instance, right click Availability Group and click on the New Availability Group wizard.
In the specify Availability Group options page, provide the AG name and click Next.
In the select databases page, select AdventureWorks and click Next.
In the specify replicas page, click on the Add Replica.. button.
In the connect to server page, enter AlwaysOnN2 in the server name box and click connect.
Repeat the step to connect to AlwaysOnN3 replica.
Enable the Automatic Failover option for AlwaysOnN1 and AlwaysOnN2, the Availability Mode will automatically switch to Synchronous commit from Asynchronous commit.
Review the other settings including the endpoints, Backup preferences, Listener, Read-Only Routing tabs. For the purpose of this lab, we won't be creating a Listener or setting up Read-Only routing.
Click Next. A message appears, review the message as this will help in troubleshooting the AG creation issue later. Click Yes on the pop-up message.
In the Select Initial Data Synchronization page, choose Join only option as we have already initialized the database on the replicas. Click Next.
Review the validation page and click Next.
Review the summary page and click Finish.
On the progress page, you will notice the wizard will wait on the Joining 'AdventureWorks' to availability group 'AGCorp' step.
After a few minutes if the wizard does not complete then click on Cancel, a message will show up. Review the message and click on Yes.
If the wizard does not close even after cancelling the current action, then try shutting down SSMS itself and open SSMS again.
After cancelling the operation, the AG will be created but the replicas won't be joined.
Feel free to delete the AG and recreate it as you try various options to fix the issue.
You have successfully completed this exercise. Click Next to advance to the next exercise.
In this exercise, you will learn how to Identify the cause of the failure and fix the issue.
This is a non-guided activity and the attendees are expected to try and troubleshoot this issue on their own.
You can use any resources (including the internet or your own scripts), to troubleshoot the issue.
You can use the tools discussed in the first module to help troubleshoot the issue.
The possible causes discussed earlier in the lesson can be used as guidance for troubleshooting.
The instructor will discuss the troubleshooting steps, cause and solution in detail after this lab session.
You might have to login directly on the individual nodes to troubleshoot the issue.
Ask yourself the below questions:
Use some commands that could be used for troubleshooting the issue.
This is not a complete list of commands/tools to help troubleshoot this issue. There are various ways/methods/approaches to troubleshoot an issue. These commands/tools referenced here could be used to look up/identify useful information for this lab.
SQL Service logon account details:
To detect if the endpoints are created or what port number is being used, query for the database_mirroring endpoints at the primary and the secondary replicas. To be run in SQLCMD mode: SSMS - Query - SQLCMD mode.
SQL:Connect AlwaysOnN1 select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING' go :Connect AlwaysOnN2 select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING' go :Connect AlwaysOnN3 select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING' go
Command to create end point, if it does not exist:
SQL:Connect AlwaysOnN1 create endpoint [Hadr_endpoint] state=started as tcp (listener_port = 5022, listener_ip = all) for database_mirroring (role = all, authentication = windows negotiate, encryption = required algorithm aes) go
To detect if the endpoints are started, query them at the primary and the secondary.
SQL:Connect AlwaysOnN1 select name, state_desc, port FROM sys.tcp_endpoints where name='hadr_endpoint' go :Connect AlwaysOn2 select name, state_desc, port from sys.tcp_endpoints where name='hadr_endpoint' go :Connect AlwaysOn3 select name, state_desc, port from sys.tcp_endpoints where name='hadr_endpoint' go
If you find an endpoint is not running, start it:
SQL:Connect AlwaysOnN1 alter endpoint [Hadr_endpoint] state = started go :Connect AlwaysOnN2 alter endpoint [Hadr_endpoint] state = started go :Connect AlwaysOnN3 alter endpoint [Hadr_endpoint] state = started go
Command to view SQL Error logs:
SQL:Connect AlwaysOnN1 Exec Xp_readerrorlog go :Connect AlwaysOnN2 Exec Xp_readerrorlog go :Connect AlwaysOnN3 Exec Xp_readerrorlog go
Command to query replica states on the secondary.
Query the sys.dm_hadr_availability_replica_states for the last_connect_error_number which may help you diagnose the join issue.
Depending on which replica was having difficulty communicating, you should query both the primary and secondary for the local replica:
SQLselect r.replica_server_name, r.endpoint_url, rs.connected_state_desc, rs.last_connect_error_description, rs.last_connect_error_number, rs.last_connect_error_timestamp from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r on rs.replica_id=r.replica_id where rs.is_local=1
To look at firewall settings, on each node you can search for firewall:
You have successfully completed this exercise. Click Next to advance to the next lab.